Hadoop’s Adolescence: A Comparative Workload Analysis from Three Research Clusters

Authors

  • Kai Ren
  • YongChul Kwon
  • Magdalena Balazinska
  • Bill Howe

Abstract

We analyze Hadoop workloads from three different research clusters from an application-level perspective, with two goals: (1) explore new issues in application patterns and user behavior and (2) understand key performance challenges related to IO and load balance. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools as well as significant opportunities for optimization. We see significant diversity in application styles, including some “interactive” workloads, motivating new tools in the ecosystem. We find that some conventional approaches to improving performance are not especially effective and suggest some alternatives. Overall, we find significant opportunity for simplifying the use and optimization of Hadoop, and make recommendations for future research.


Similar resources

Hadoop’s Adolescence: An Analysis of Hadoop Usage in Scientific Workloads

We analyze Hadoop workloads from three different research clusters from a user-centric perspective. The goal is to better understand data scientists’ use of the system and how well the use of the system matches its design. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools. We see significant diversity in resource usage ...

Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop (CMU-PDL-09-103)

Mochi, a new visual, log-analysis based debugging tool, correlates Hadoop’s behavior in space, time and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior using which users can reason about and debug performance issues. We provide examples of Mochi’s value in revealing a Hadoop j...

Strength of Co-authorship Ties in Clusters: A Comparative Analysis

We analyze the strength of ties through three different clustering algorithms applied to co-authorship social networks from three different research areas. This study reveals whether tie strength metrics can be used to evaluate cluster quality. We obtain different results for each algorithm and observe that Markov cluster algorithm provides the best results for co-authorship social networks. Also, ...

Trajectories of Low Back Pain From Adolescence to Young Adulthood.

OBJECTIVE Despite the high prevalence and burden of low back pain (LBP), understanding of its course during the transition from adolescence to adulthood is limited. The aim of this study was to identify and describe trajectories of LBP and its impact among a general population sample followed from adolescence to young adulthood. METHODS Data from followup assessments at years 17, 20, and 22 o...

Publication date: 2012